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The effect of different move sets on the folding kinetics of the Monte Carlo simulations is analysed 
based on the conformation-network and the temperature-dependent folding kinetics. A new scheme 
of implementing Metropolis algorithm is given. The new method is shown to satisfy the detailed 
balance and converge efficiently towards thermal equilibrium. A new quantity, employed from 
the continuous time Monte Carlo method, is introduced to identify effectively the kinetic traps of 
foldings. 
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The problem how proteins fold, in milliseconds to sec- 
onds, into unique and stable structures with definite bio- 
logical functions has recently become intriguing to the 
biophysiciansQ. The kinetic feature of such problem 
amounts to the characteristics of the folding paths. Sub- 
ject to this, considerable progress have been achieved 
through numerical studies of lattice heteropolymers in 
two or three dimensions^, 0, 3- Although oversimpli- 
fied, characteristic features obtained from the simula- 
tions, such as folding funneljH, |(J, folding bottleneck^] 
and kinetic traps 0,0], have provided much insights to the 
kinetic process. However, there exists some suspicions 
about the simulation method^ |j| and few ambigui- 
ties in the move sets adopted for the simulations pi Il0| . 
For the former, question about implementing the kinetic 
Monte Carlo algorithm was raised and some implemen- 
tations were shown to violate the condition of detailed 
balance^. For the latter, Chan and Dill^j and Hoang 
and Cieplak^l stressed the strong dependence of the 
folding landscape on the choice of move sets. Conse- 
quently, further clarification and improvement for the 
methodology remain essential. In Ref.0, propos- 
als of refining the Monte Carlo algorithm were discussed. 
In addition, move steps other than the conventional ones, 
such as the snake move [ljj, have also become of interest. 

In this Letter we attempt a systematic analysis for the 
move sets in relation to the configuration space and de- 
velop a new method of implementing the Metropolis al- 
gorithm jl ll| appropriate for the problem class of protein 
foldings. While the conformational networks a ppe ar to 
be a natural embedding of the folding paths |l2l . fl3| . 
the move sets as the sets of generators of connections 
can correlate with the geometry of such setting. Subse- 
quently, the identification of the configuration space en- 
ables a straightforward program for the new Monte Carlo 
implementation. We demonstrate its reliability as well 
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as accuracy and efficiency by means of explicit computa- 
tions. Then, the continuous time Monte Carlo (CTMC) 
method[lJ, 113 equipped with the new schem is used to 
study the folding kinetics associated with different move 
sets. Interestingly, by invoking CTMC simulations, it 
is found more general to label the kinetic traps with the 
new quantity, called viscosity-factor, in contrast to by the 
conventional local energy minimum since entropy effects 
are also taken into account. 

A specific model system is used to carry out the study. 
Based on the considerations given byRef. [3 , we con- 
sider the H-P modele on a 2D square latticejlq with 
the " protein- like" sequence of 16 monomers specified 
as PHPPPHPPHPPPPHHP. As the consequence 
of assigning the contact energies with Eh,h = —3.3, 
Eh.p = —2 and Epp = —1, the energy of the na- 
tive state, shown in the inset of Fig. 1(b), appears 
to be Eg = —13.3. In addition, the mean energy gap 
AE g = 1.97 is obtained by averaging out all the energy 
differences between the lowest 10 excited states and the 
native one. The specific-heat curve C v (T) possesses a 
single peak, representing the phase transition from the 
molten globule to the native state, as well as a shoulder 
at a higher temperature, signifying the transition from 
random coil to molten globule. Moreover, the folding 
temperature Tf, defined as the temperature at which 
the probability of the native conformation is 1/2, is at 
Tf = 0.50 which is near to the specific-heat peak. 

We are concerned with move sets consisting of typi- 
cal move steps, say the end flip (e/), corner shift (cs), 
crankshaft (cr) and the rigid rotation (rr) 3]. The con- 
ventional move set Si only allows for ef, cs and cr based 
on locality. However, the ergodicity is shown to fail for 
<Si[3- To be concrete, in two dimensions the move set Si 
prohibits the reaching of one conformation from the other 
ones for the folding with 16 monomers and, considerably, 
the number of such conformations rapidly increases for 
more monomers and/or dimensions. The problem can be 
remedied by involving moves of rr type which have been 
realised in some simple diffusive motions for groups of 
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monomers |'i I 1 8| . While ef itself can be viewed as short- 
scale rigid rotation, an ergodic move set, say 52, can be 
achieved by simply combining ef with rr. Finally, in- 
volving all the four types, an 53 move set is also known 
to be ergodic. 

Different move sets generate different conformation- 
networks for which, each conformation is a node and 
two nodes are connected by an edge if and only if the 
two can be transformed to each other via an elementary 
move of the move set[(| 0, 0, |2(]]. The nodes for the 
conformation-network are unambiguously given as long 
as the ergodicity is fulfilled. For the case at hand, the 
number of monomers M = 16 and the total node number 
K = 802, 075. On the other hand, the edge distributions 
among nodes remain sensitive to the choice of the move 
sets. Heuristically, few measures for the edge distribu- 
tions are shown to provide the global properties of the 
conformation networks as folding in the high tempera- 
ture limit. 

Define the number of edges ki associated with any node 
i by means of one elementary move. The correspond- 
ing mean values per node (k) are 9.2, 20.7 and 26.3 for 
Si, S2 and 53, respectively. The (k) value for S2 or S3 
is twice more than the value for Si- Thus, the larger 
(k) value provides more through-way accessibility to the 
native state and reduce the possibility of being trapped 
in local minimum. The minimal edge number l± for a 
node i to connect with the native state is also of interest. 
Upon enumerating this for the model sequence with re- 
spect to various move sets and taking average, we obtain 
(I) = 26.4 for Si and 8.4 for both S 2 and S3. The (/) 
value for S2 or S3 is about one-third of that for Si . More 
thorough informations are provided by the distributions 
P (k) and P(l), where P (k) equips each node with the 
probability of accessing k nodes and P (I) corresponds to 
the probability associated with minimal edge number I 
between a node and the native state. In Fig. 1(a), the 
results of P (k) are given with respect to different move 
sets. In Fig. 1(b), the P(l) curves develop very nar- 
row widths of the distributions for S2 and S3, sharply 
contrasting to the case for Si- The results for P (k) and 
P (I) assure us that, at high temperature, the kinetic fea- 
tures of S*2 and S3 are similar but are drastically different 
from that of Si. Meanwhile, for finite temperatures, the 
structure peculiar to problem setting remains important. 

The method of Monte Carlo simulation is constrained 
by the condition of detailed balance held on the transition 
between any of two connected nodes, say from c k to c m , 

Peg (Cfc) W (Cfc -» Cm) = P eq (c m ) W (c m -> Cfc) , (1) 

where P eq (cfc) is the equilibrium probability for the 
conformation c k , and the transition probability rate 
W (cfc — > c rn ) can be factorized as 

W (c fe -> Cm) = p(c k ^ Cm) ■ P acc (c fe -> Cm) , (2) 

with the probability of applying the correspond- 
ing update p(c k —>c m ) and the acceptance rate 



P acc (ck^c m ). For the randomly chosen target 
node c m from the k Ck connected nodes, we have 
p(ck— > c m ) = I/fee,,- While the acceptance ratio is 
R = P acc {c k -> Cm) /P acc {c m c fc ), the condition of 
detailed balance of Eq. and the factorized form of 
W (cfc — > c m ) of Eq. lead to 

R = r k , m exp [- {E Cm - E Ck ) /T] , (3) 

with r k>m = k Ck /k Cm , where the Boltzmann constant is 
set to unity and T is the temperature. Consequently, 

P aCC (c k ^Cm) = Y f Jl , (4) 

the acceptance rate depends on both the energy differ- 
ence and the edge numbers of two nodes. Therefore, our 
scheme bears interesting correspondence to the embed- 
ding structure of the conformation-network. This is dif- 
ferent from the method suggested by Collet who em- 
ployed p (cfc — * c m ) = 1/ (2M — 5) but adopted r k , m = 1 
for the acceptance ratio R of Eq. © above of the basis 
of the move set S\. Noteworthy is also the conventional 
scheme equipped with randomly updating Cfc — ► c m and 
Tk,m — 1 may violate the condition of detailed balance^]- 
Subsequently, the reliability as well as the efficiency of 
the algorithm and the dynamic features with respect to 
move sets are of concern. 

Consider the standard deviation of probability dis- 
tribution from the equilibrium probability, D (t) — 

(Efcii(^e 9 (cfc)-n( C fc,i)) 2 /^) 1/2 , where II(c fe ,i) is 
the occurrence probability of the state c k during the time 
steps t. By generating the Monte Carlo trajectories of 300 
billion steps at T = 2.50 starting from the completely 
extended chain, the values of D (i) can be sampled at 
each 100, 000 Monte Carlo steps. The results depicted in 
Fig. 3 reveal interesting issues. First of all, as seen from 
Fig. 2(a), Monte Carlo simulations with randomly up- 
dating Cfc — > c m and r k .m = 1, which violates the detailed 
balance, cannot converge toward thermal equilibrium [jj. 
Relative to this, employing the non-ergodic set Si plays 
minor role for the unachievable thermal equilibrium since 
only one isolated node exists for M = 16. In addition, 
the Collet's method, for which one spend additional time 
in determining whether an update is applied or not, tends 
to slow down the convergence toward the thermal equi- 
librium in contrast to our new scheme. Meanwhile, Fig. 
2(b) also shows up even more rapid convergence for the 
move sets S2 and S3 by means of our new scheme. But 
the choice of move sets appears to be insensitive to the 
asymptotic behaviours of the standard deviation of prob- 
ability distribution, the corresponding scaling rule reads 
then D (t) ~ i -1 / 2 . Thus, we conclude the reliability and 
efficiency for simulations based on our scheme. The price 
to pay is that for each update the edge numbers of the 
present and target nodes have to be known. 

For analyzing the perspectives of folding kinetics in 
relation to move sets, we refer to the CTMC method 
proposed by Gillespie 14]. The CTMC method can be 
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equipped with our scheme, and it suggests that the prob- 
abibility of the transition — * c rn is 



P (Cfc -*• Cr, 



W (Cfc -> C m ) 

W s (c fc ) 



(5) 



and the probability of the occurrence of a transition from 
the current node Ck at the time r is 



^ (t) = [W s (c fc )] exp 1-tWs (c fc )] 



(6) 



where VT (cfc — ► c m ) is given by Eq. 10) and W s (c fc ) = 

Sj=i W ( c fe — * c i)- Accordingly, we update the node Cfc 
and determine its time duration r by using two ran- 
dom numbers r\ and ri ranging between and 1. With 
the transition probability of Eq. JSJl, we calibrate the 
transition from Ck to q for < ri < Xi subject 

to X m = YT=i W (°k -> c i) /W s (c fe ). Then, based on 
Eq. © we estimate the time duration r at Cfe by setting 
r 2 = cxp [-tW s (cfc)] to yield r = - In (r 2 ) /Ws (c fe ). 

We first employ the CTMC method described in the 
last paragraph to measure the first passage time tf, de- 
fined as the required time from a stretched chain to the 
first arrival of native state. The mean values, (tf), at 
various temperatures for different move sets are obtained 
by averaging over the results of 1000 simulations. The 
curves of log(f/) versus 1/T are parabolic for all move 
sets. Thence, there exists the fastest folding temperature 
T min with the value 1.74 for Si, and 1.70 for 52 and S3. 
The corresponding (tf) at T min is 1.2E5 for Si, 3.5EA 
for 52, and 2.8EA for 53. Moreover, there exists a linear 
behaviour for log(t/) against 1/T at the region of low 
temperatures, and this gives rise to the Arrhcnius law 
as (tf) ~ exp((_Ef,} /T) with the mean activation energy 
(E b ) = 6.72 for S x , 5.04 for 5 2 , and 4.44 for S3. All 
these results are consistent with the geometric features 
of the conformation- networks, namely that the network 
associated with 52 or 53 owns a larger value of the mean 
number of edges associated with a node and a smaller 
value of the mean minimal edge number between a node 
and the native state in comparison with those for Si . 

The kinetic traps, referred to the states strongly pro- 
hibiting the folding into the native one but locating at 
only few steps away from it, are important for under- 
standing the folding kinetics. Their identifications often 
rely on the distinction of local energy minima without 
taking account of the entropy effect. To make the cri- 
terion more effectively, we introduce the measure, the 
average time duration of a state, in the frame of the 
CTMC method to distinguish obstacle states. The aver- 
age time duration of a state Ck is, from Eq. J^J, given as 
( T c k ) = Jo°° rP Ck (t) dr, and it yields (r Cfc ) = 1/W S (c k ). 
By defining the ratio of (r Cfc ) to the average value of all 
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the viscosity-factor associated with the state Cfc, we can 
signify an obstacle state of foldings by means of r\ > 1. 



The root mean square deviation (RMSD) of the loca- 
tion of a node c& from that of the native state cq is defined 

1/2 



as d 



RMSD 



(cfc) = 



(Cfc) - r t (c )) 2 /M 
where ~fi (cfc) is the position vector for the ith monomer 
of the state Cfc. The d^MSD (cfc) is adopted as the con- 
formational distance of Cfc from the native state. Note 
that the other alternative is to take the minimal num- 
ber of edges between two nodes as the conformational 
distance|3j, but this suffers from the ambiguity owing to 
the choice of move sets. By taking the average of the r) 
values, (rj), of the states associated with the given d^MSD 
as the viscosity-factor of the duMS D > we show the results 
of (rj) versus dnMSD at the fastest folding temperatures 
T min of different move sets in Fig. 3. The results indicate 
that the folding process can be divided into two stages. 
One is the folding from the denatured to the semicom- 
pact states for which, it corresponds to duMSD ^ 2 and 
the viscosity-factor variations among different move sets 
are very small. Owing to the large scale change caused 
by rigid rotation, the folding time for 52 and 53 can 
be expected to be much shorter than that for Si. The 
other stage is the adjustment to the native from the semi- 
compact states corresponding to the range duMSD J$ 2. 
Because of the high (77) values, the searching of the na- 
tive state is a very slow process for all move sets. For a 
given d RM SD < 2, the inequality (r]) S2 > (r}) Ss > (r]) Si 
indicates that the local moves are more effective for the 
searching. The strongest obstacles of foldings for differ- 
ent move sets all occur at d^MSD — 0.5 which contains 9 
conformations. Among these, the state with the largest 
r\ value, shown in the inset of Fig. 3, is the same for dif- 
ferent move sets, and the corresponding r\ value is 21.4 
for Si, 50.9 for 52, and 37.6 for 53. From the numerical 
results of other sequences, we notice that the strongest 
obstacle is not necessary the same for different move sets, 
it may also vary with the sequence; but the inequality 
> r IS3 > Vsi f° r t ne strongest obstacle of a sequence 
is always true. 

In summary, we present a detailed study on the 
methodology of lattice Monte Carlo method, including 
the implementation of the algorithm and the move set 
adopted in the simulations. A new method of implement- 
ing the Metropolis algorithm, which is shown to satisfy 
the condition of detailed balance, is proposed, and the 
characteristic features of different move sets in the fold- 
ing kinetics are given. In particular, we combine the 
CTMC method with the new implementation to intro- 
duce a more effective quantity, called viscosity-factor, to 
identify the kinetic traps. 
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Fig. 1. (a) The distribution function of the edge num- 
ber associated with a node, P (fc), versus the edge num- 
ber, fc; and (b) the distribution function of the distance 
/ from a conformation to the native conformation, P (I), 
versus the distance I for the move sets, Si (circles), S2 
(triangles), and S3 (crosses). The inset shown in (b) is 
the native conformation of the model sequence. 

Fig. 2. The standard deviation of probability distribu- 
tion from the equilibrium probability, D (i), as a function 
of the number of Monte Carlo steps t, at T = 2.50: (a) 
the results of conventional implementation (circles), Col- 
let's method (triangles), and the new proposed method 
(crosses) with the move set 5i ; and (b) the results of the 
new proposed method with the move sets Si (circles), S2 
(triangles), and ^(crosses) with the straight lines given 
by D(t) -t- 1 / 2 . 

Fig. 3. The average value of viscosity- factors, (77), 
versus the distance to the native conformation, cIrmsd, 
at the fastest folding temperatures T m i n for the move sets 
5i (circles) , S2 (triangles), and ^(crosses). The inset 
shows the conformation with the largest rj value for the 
peak in the curve of (77) versus cIrmsd- 
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